Automatic Speaker Recognition Using MSVQ-Coded Speech
Abstract
Low-bitrate speech coding finds application in both telecommunications (bandwidth compression) and archival (file compression). Speaker verification is used in telecommunication applications (to gain access to particular services, for example) and implies that either or both of the speech data streams (incoming and reference) may be compressed. In this paper, we investigate the effect of high-compression methods on the effectiveness of automatic speaker identification and verification. Lossy compression of the speech (whether transmitted or stored) requires vector quantization of the short-term spectral parameters in order to achieve high compression ratios, and thus implies some loss of accuracy in the representation of these parameters. However, in the situation where the same spectral parameters are utilized in identifying the speaker, the identification accuracy may be compromised by the compression process. We present in this paper our findings on the effect of compression on identification, for one particular family of vector quantization methods.
PROBLEM FORMULATION
In considering the evaluation of the effect of spectrum compression on speaker identification, four possible scenarios arise, as shown in Table 1. These are:
(i) The "benchmark" for all cases, using "raw" speech in the identification process. No compression is performed.
(ii) The speech database is compressed (for example, on CD-ROM) and the incoming speech is available in uncompressed form.
(iii) The incoming speech is compressed, but the reference is not. This arises in telecommunications applications. Note that in this case the speaker identification parameters may be pre-computed and stored (depending on the identification algorithm), allowing the speech database to be compressed.
(iv) Both the existing database and the incoming speech are compressed.
Case (ii) is studied in this paper, and is illustrated in Figure 1.
This situation arises in forensic speech processing, where the database of suspects has been archived and a new suspect is to be compared. It is assumed that the distance D_coded is available, and the distance D_uncoded is not available. A Vector Quantization (VQ) scheme is designed for the speech spectral parameters, and two methods of speaker identification are examined: the Mahalanobis distance and the log-probability derived from a Multivariate Gaussian Mixture Model (GMM). Two families of VQ methods that are known to achieve high compression are studied: multistage VQ and split VQ.
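As a rough illustration of the ideas in this paragraph, the following Python sketch encodes a feature vector with a two-stage multistage VQ (each stage quantizes the residual left by the previous stage) and then compares the Mahalanobis distance computed on the uncoded vector with the one computed on the reconstructed, coded vector. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 2-D codebooks in STAGES, the two speaker models in SPEAKERS, the diagonal covariance, the sequential (greedy) stage search, and all function names.

```python
from math import fsum

def nearest(codebook, vec):
    """Return (index, codeword) of the codeword closest to vec (squared Euclidean)."""
    def sqdist(c):
        return fsum((a - b) ** 2 for a, b in zip(c, vec))
    idx = min(range(len(codebook)), key=lambda i: sqdist(codebook[i]))
    return idx, codebook[idx]

def msvq_encode(stages, vec):
    """Sequential-search multistage VQ: each stage quantizes the previous residual."""
    indices, residual = [], list(vec)
    for codebook in stages:
        idx, codeword = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codeword)]
    return indices

def msvq_decode(stages, indices):
    """Reconstruct the vector by summing the selected codeword of every stage."""
    out = [0.0] * len(stages[0][0])
    for codebook, idx in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def mahalanobis_sq(vec, mean, var):
    """Squared Mahalanobis distance, assuming a diagonal covariance (illustrative)."""
    return fsum((x - m) ** 2 / v for x, m, v in zip(vec, mean, var))

# Hypothetical toy codebooks: a coarse first stage and a refinement stage.
STAGES = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

# Hypothetical speaker models: (mean, per-dimension variance).
SPEAKERS = {
    "A": ([1.0, 1.0], [0.25, 0.25]),
    "B": ([2.0, 0.0], [0.25, 0.25]),
}

x = [1.2, 0.9]                      # uncoded feature vector
codes = msvq_encode(STAGES, x)      # one index per stage
x_hat = msvq_decode(STAGES, codes)  # what the identifier sees after coding

d_uncoded = {s: mahalanobis_sq(x, m, v) for s, (m, v) in SPEAKERS.items()}
d_coded = {s: mahalanobis_sq(x_hat, m, v) for s, (m, v) in SPEAKERS.items()}
best_coded = min(d_coded, key=d_coded.get)

print("stage indices:", codes)
print("reconstruction:", x_hat)
print("D_uncoded:", d_uncoded)
print("D_coded:", d_coded)
print("identified from coded speech:", best_coded)
```

In this toy case the quantization error shifts the distances but not the ranking of the speakers. Whether that holds for real spectral parameters at high compression ratios is the question the paper investigates.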
Similar resources
Speaker recognition from coded speech in matched and mismatched conditions
We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments use standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss in recognition performance for toll quality speech...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Speaker Independent Speech Recognition Using Hidden Markov Models for Persian Isolated Words
Automatic speaker recognition on a vocoder link
Automatic speaker recognition on a vocoder link has rarely been explicitly tested. In this paper, we show how the automatic speaker recognition could be used on a vocoder link. In a first experiment where we consider the “coder-link-decoder” speech system as a black box, a classic speaker recognition method (applied on the reconstructed speech) is shown to be able to provide an objective measur...